per chi vuole provare a simulare le cose in tempo reale
qr code che manda a questo link https://github.com/sitalaura/link-functions/tree/main/R
oppure scaricare il file a questo percorso sitalaura.github.io/link-functions/R/datasim.R
independent variable: age in years (years)
dependent variable: (variabile)
aggiungi screenshot dataset
using the classical linear predictor
what we dont see it bc its a default parameter but its actually hidden in our code:
the model uses family gaussian and the identity link function
link function in GLMs transforms (re-map) the linear predictor X
to the appropriate range of the response variable Y
equal intervals on X correspond to equal intervals on Y
PLOT DELLA RETTA su x ed y metti i nomi delle variabili dell’esempio
independent variable: age in years (years)
dependent variable: mistakes in a reading task (errors)
aggiungi screenshot dataset
using the classical linear predictor
fitL = glm(y~age, data=d)
effL = data.frame(allEffects(fitL,xlevels=list(age=seq(min(age),max(age),.05)))$"age")
ggplot(d,aes(x=age,y=y))+
coord_cartesian(ylim=c(-0.4,max(y))) +
geom_point(size=4,alpha=.5,color="darkblue")+
geom_line(data=effL,aes(y=fit),size=2,color="darkred")+
geom_ribbon(data=effL,aes(y=fit,ymin=lower,ymax=upper),alpha=.3,fill="darkred")+
theme(text=element_text(size=ts,color="black"))+
scale_y_continuous(breaks=seq(0,20,2))+scale_x_continuous(breaks=seq(0,20,.5))+
ylab("Errors")+xlab("Age (years)")IN THE FIRST EXAMPLE an identity link was appropriate bc here
x (years) spans from -inf to +inf
y (boh) spans from -inf to +inf
here an identity link is NOT appropriate bc
x (years) spans from -inf to +inf
y (errors) spans from 0 to +inf
equal intervals on X correspond to equal ratios (NOT equal intervals) on Y
PLOT DEL LOGARITMO (O QUELLO APPROPRIATO CON L’ESEMPIO) su x ed y metti i nomi delle variabili dell’esempio
using the appropriate distribution family=poisson
and the appropriate link link="log"
fitP = glm(y~age, family=poisson(link="log"), data=d)
effP = data.frame(allEffects(fitP,xlevels=list(age=seq(min(age),max(age),.05)))$"age")
ggplot(d,aes(x=age,y=y))+
coord_cartesian(ylim=c(-0.4,max(y))) +
geom_point(size=4,alpha=.5,color="darkblue")+
geom_line(data=effP,aes(y=fit),size=2,color="blue")+
geom_ribbon(data=effP,aes(y=fit,ymin=lower,ymax=upper),alpha=.3,fill="blue")+
#geom_line(data=effL,aes(y=fit),size=2,color="darkred")+
#geom_ribbon(data=effL,aes(y=fit,ymin=lower,ymax=upper),alpha=.3,fill="darkred")+
theme(text=element_text(size=ts,color="black"))+
scale_y_continuous(breaks=seq(0,20,2))+scale_x_continuous(breaks=seq(0,20,.5))+
ylab("Errors")+xlab("Age (years)")independent variable: age in years (years)
dependent variable: mistakes in a reading task (errors)
adding a new main effect
groups: normal kids (group = 0) vs kids with dyslexia (group = 1)
an interaction would emerge with a classical linear model (commenta l’output) il link è quello identico mettilo nel codice
using the wrong link
Warning in finalizeTMB(TMBStruc, obj, fit, h, data.tmb.old): Model convergence
problem; non-positive-definite Hessian matrix. See vignette('troubleshooting')
Data: d
Models:
fitL_0: y ~ age + group + (1 | id), zi=~0, disp=~1
fitL_1: y ~ age * group + (1 | id), zi=~0, disp=~1
Df AIC BIC logLik deviance Chisq Chi Df Pr(>Chisq)
fitL_0 5
fitL_1 6 1320.3 1341.5 -654.17 1308.3 1
ts <- 16
tc <- "black"
eff_obj <- allEffects(
fitL_1,
xlevels = list(age = seq(min(d$age), max(d$age), .05))
)[["age:group"]]
effL <- as.data.frame(eff_obj)
# garantisci colonna fit
if (!("fit" %in% names(effL))) {
pred_col <- intersect(c("fit", "response", "fitted", "yhat"), names(effL))
if (length(pred_col) == 0)
stop("Nessuna colonna di predizione trovata in effL.")
effL$fit <- effL[[pred_col[1]]]
}
ggplot(d, aes(x = age, y = y, shape = group, color = group)) +
geom_point(size = 4, alpha = .6) +
geom_ribbon(
data = effL,
aes(x = age, ymin = lower, ymax = upper, fill = group, group = group),
alpha = .3,
color = NA,
inherit.aes = FALSE
) +
geom_line(
data = effL,
aes(x = age, y = fit, color = group, group = group),
linewidth = 2,
inherit.aes = FALSE
) +
scale_color_manual(values = c("darkorange3", "darkgreen")) +
scale_fill_manual(values = c("darkorange3", "darkgreen")) +
theme(text = element_text(size = ts, color = tc)) +
scale_x_continuous(breaks = seq(6, 10, .5)) +
labs(y = "Errors", x = "Age (years)")no significant interaction using the correct link
fitP_0 = glmmTMB(y ~ age + group + (1|id), family=poisson(link="log"), data=d)
fitP_1 = glmmTMB(y ~ age * group + (1|id), family=poisson(link="log"), data=d)
anova(fitP_0, fitP_1)Data: d
Models:
fitP_0: y ~ age + group + (1 | id), zi=~0, disp=~1
fitP_1: y ~ age * group + (1 | id), zi=~0, disp=~1
Df AIC BIC logLik deviance Chisq Chi Df Pr(>Chisq)
fitP_0 4 1210.6 1224.7 -601.31 1202.6
fitP_1 5 1212.5 1230.1 -601.25 1202.5 0.116 1 0.7334
fitP = glm(y~age*group, family=poisson(link="log"), data=d)
effP = data.frame(allEffects(fitP,xlevels=list(age=seq(min(age),max(age),.05)))$"age:group")
effP$group = as.factor(effP$group)
ggplot(d,aes(x=age,y=y,shape=group,color=group))+
geom_point(size=4,alpha=.6)+
scale_color_manual(values=c("darkorange2","darkgreen"))+
scale_fill_manual(values=c("darkorange1","darkgreen"))+
geom_line(data=effP,aes(x=age,y=fit,group=group,linetype=group),size=2)+
geom_ribbon(data=effP,aes(x=age,y=fit,ymin=lower,ymax=upper,group=group,fill=group),alpha=.3,color=NA)+
theme(text=element_text(size=ts,color="black"))+scale_x_continuous(breaks=seq(0,20,.5))+
ylab("Errors")+xlab("Age (years)")type I error arriva da 0.05 a 50% sulle interazioni con la link function sbagliata show code (da fare) grafico
there are many link functions (and families of distributions)
use the right one: otherwise it’s very likely you end up finding something that is not there!
All materials are available on GitHub at sitalaura/link-functions
Questions and feedbacks laura.sita@studenti.unipd.it
if we put the wrong family, but the correct link family=gaussian(link="log") still no interaction (as it should)
fitL_log_0 <- glmmTMB(y ~ age + group + (1|id), family = gaussian(link="log"), data=d, start=list(beta=c(b0, 0, 0)))
fitL_log_1 <- glmmTMB(y ~ age * group + (1|id), family = gaussian(link="log"), data=d, start=list(beta=c(b0, 0, 0, 0)))
anova(fitL_log_0, fitL_log_1)Data: d
Models:
fitL_log_0: y ~ age + group + (1 | id), zi=~0, disp=~1
fitL_log_1: y ~ age * group + (1 | id), zi=~0, disp=~1
Df AIC BIC logLik deviance Chisq Chi Df Pr(>Chisq)
fitL_log_0 5 1228.8 1246.4 -609.41 1218.8
fitL_log_1 6 1230.3 1251.5 -609.16 1218.3 0.4946 1 0.4819
Cognitive Science Arena 2026